CarveML: application of machine learning to file fragment classification
Abstract
We present a machine learning approach to recognizing the file types of file fragments, with the aim of applying it to "file carving": the reconstruction of partially erased files on disk into whole files. We compute 257 features for each input fragment and apply Support Vector Machine, Multinomial Naive Bayes, and Linear Discriminant Analysis models to the problem to determine which one classifies fragments most accurately.
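The abstract does not list the 257 features or give any code. The sketch below is a minimal illustration under an assumption of our own: that the features are the 256 normalized byte-value frequencies of a fragment plus its Shannon entropy (one common way to reach 257). It also implements, from scratch, a multinomial Naive Bayes classifier over raw byte counts, one of the three models named. `fragment_features` and `ByteCountNB` are hypothetical names, not taken from the paper.

```python
import math
from collections import Counter


def fragment_features(fragment: bytes) -> list[float]:
    """Hypothetical 257-feature vector: 256 byte-value frequencies plus Shannon entropy."""
    n = len(fragment)
    if n == 0:
        return [0.0] * 257
    counts = Counter(fragment)  # iterating over bytes yields ints in 0..255
    freqs = [counts.get(b, 0) / n for b in range(256)]
    # Shannon entropy in bits per byte: 0 for a constant fragment, 8 for a uniform one
    entropy = -sum(p * math.log2(p) for p in freqs if p > 0)
    return freqs + [entropy]


class ByteCountNB:
    """Hypothetical minimal multinomial Naive Bayes over raw byte counts (Laplace smoothing)."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha
        self.log_prior: dict[str, float] = {}
        self.log_likelihood: dict[str, list[float]] = {}

    def fit(self, fragments: list[bytes], labels: list[str]) -> "ByteCountNB":
        for c in sorted(set(labels)):
            members = [f for f, y in zip(fragments, labels) if y == c]
            self.log_prior[c] = math.log(len(members) / len(fragments))
            # Total occurrences of each byte value across this class's fragments
            totals = [0] * 256
            for frag in members:
                for b in frag:
                    totals[b] += 1
            denom = sum(totals) + self.alpha * 256
            self.log_likelihood[c] = [math.log((t + self.alpha) / denom) for t in totals]
        return self

    def predict(self, fragment: bytes) -> str:
        counts = Counter(fragment)

        def log_posterior(c: str) -> float:
            # log P(c) + sum over byte values of count * log P(byte | c), up to a constant
            ll = self.log_likelihood[c]
            return self.log_prior[c] + sum(k * ll[b] for b, k in counts.items())

        return max(self.log_prior, key=log_posterior)


if __name__ == "__main__":
    train = [b"hello world, plain text here", b"more plain ascii text",
             bytes(range(256)), bytes(reversed(range(256)))]
    labels = ["txt", "txt", "bin", "bin"]
    model = ByteCountNB().fit(train, labels)
    print(model.predict(b"some other text"))  # txt
    print(len(fragment_features(b"some other text")))  # 257
```

The hand-written classifier only shows what multinomial Naive Bayes computes. For an actual comparison like the one the abstract describes, one would more likely fit scikit-learn's `SVC`, `MultinomialNB`, and `LinearDiscriminantAnalysis` on the same feature matrix and compare their held-out accuracy.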
Similar resources
Machine Learning and Citizen Science: Opportunities and Challenges of Human-Computer Interaction
Background and Aim: In processing large data, scientists have to perform the tedious task of analyzing a hefty bulk of data. Machine learning techniques are a potential solution to this problem. In citizen science, human and artificial intelligence may be unified to facilitate this effort. Considering the ambiguities in machine performance and management of user-generated data, this paper aims to...
A File Fragment Classification Method Based on Grayscale Image
File fragment classification is an important and difficult problem in digital forensics. Previous works in this area mainly relied on specific byte sequences in file headers and footers, or statistical analysis and machine learning algorithms on data from the middle of the file. This paper introduces a new approach to classify file fragment based on grayscale image. The proposed method treats a...
APPLICATION OF THE HYBRID HARMONY SEARCH WITH SUPPORT VECTOR MACHINE FOR IDENTIFICATION AND CALSSIFICATION OF DAMAGED ZONE AROUND UNDERGROUND SPACES
An excavation damage zone (EDZ) can be defined as a rock zone where the rock properties and conditions have been changed due to the processes related to an excavation. This zone affects the behavior of the rock mass surrounding the construction, which reduces the stability and safety factor and increases the probability of failure of the structure. This paper presents an approach to build a model for the ...
A Hybrid Machine Learning Method for Intrusion Detection
Data security is an important area of concern for every computer system owner. An intrusion detection system is a device or software application that monitors a network or systems for malicious activity or policy violations. Various artificial intelligence techniques have already been used for intrusion detection. The main challenge in this area is the running speed of the available implemen...
Fast Content-Based File Type Identification
Digital forensic examiners often need to identify the type of a file or file fragment based on the content of the file. Content-based file type identification schemes typically use a byte frequency distribution with statistical machine learning to classify file types. Most algorithms analyze the entire file content to obtain the byte frequency distribution, a technique that is inefficient and t...